Fast Fourier transform processor

ABSTRACT

A Fast Fourier Transform (FFT) processor is provided. It comprises a multiplexer, a first angle rotator, a second angle rotation and multiplexing unit, an adder, a twiddle factor storage, a multiplier, and a data storage. The FFT processor analyzes the input/output order of the Fast Fourier Transform, separates the portions requiring complex computations, simplifies the corresponding hardware, and adjusts the output order. This not only saves hardware area but also reduces the amount of computation and the number of memory accesses, thereby reducing power consumption.

FIELD OF THE INVENTION

The present invention generally relates to the Fast Fourier transform (FFT), and more specifically to a Fast Fourier Transform processor.

BACKGROUND OF THE INVENTION

As mobile communication becomes more ubiquitous, the bandwidth demands of wireless local area networks (WLAN) also increase. In the IEEE 802.11a specification, proposed to meet these demands, the Fast Fourier Transform (FFT) computation unit plays an important role in modulation. The FFT computation unit transforms data in the frequency domain into corresponding data in the time domain. This feature helps mitigate the signal attenuation and multi-path interference problems often faced in wireless communication. Therefore, present and future communication specifications will continue to use FFT computation. However, the wireless communication hardware must be able to support the large amount of computation involved.

The structures of conventional FFT circuitry are categorized into three types: single-memory, dual-memory, and pipeline. The single-memory structure uses only one computation unit and exploits the in-place computation property of the FFT; therefore, it uses the smallest circuitry area. However, this type of structure has the disadvantage of high computational latency. The dual-memory structure uses one memory to store the input and another to store the output; therefore, it provides a higher throughput than the single-memory structure. Nevertheless, it takes $\log_r N$ computation units (r and N being positive integers) and requires the largest circuitry area.

The Discrete Fourier Transform (DFT) is defined as follows:

$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}$

where k = 0, 1, ..., N−1, n = 0, 1, ..., N−1, and $W_N = e^{-j2\pi/N}$ is the twiddle factor.
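As a reference point for the hardware discussed below, the definition can be evaluated directly. The following minimal Python sketch (the function name `dft` is illustrative and not part of the invention) computes every output with O(N²) complex multiplications. That cost is exactly what the FFT structures discussed here aim to avoid.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * W_N^(k*n), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# A unit impulse has a flat spectrum; a constant input puts all energy in X[0].
print([round(abs(v), 6) for v in dft([1, 0, 0, 0])])   # [1.0, 1.0, 1.0, 1.0]
print([round(abs(v), 6) for v in dft([1, 1, 1, 1])])   # [4.0, 0.0, 0.0, 0.0]
```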

FIG. 1 shows a schematic view of a conventional FFT processor structure. In 2002, Lenart proposed, in “A Pipelined FFT Processor Using Data Scaling with Reduced Memory Requirements” (Proc. of Norchip, Nov. 11-12, 2002, Copenhagen, Denmark), a pipelined FFT processor structure using a base of four. The proposed structure takes three stages to process 64 points, requires three multipliers, and accesses the rotation factors three times. Up to six first-in-first-out (FIFO) buffers are accessed in each clock cycle.

FIG. 2 shows a schematic view of another conventional FFT processor structure. In 2003, Maharatna proposed, in “A Novel 64-Point FFT/IFFT Processor for IEEE 802.11a Standard” (ICASSP 2003), an FFT processor structure using a base of 8. As shown in FIG. 2, the structure processes the 64-point FFT computation by expanding two base-8 butterfly-type computation units and eight output units with specialized hardware connections. Although this structure reduces the latency, it requires more hardware computation units.

FIG. 3 shows a schematic view of yet another conventional FFT processor structure. In 2002, Guo proposed, in “A New Hardware-Efficient Design Approach for the 1D Discrete Fourier Transform” (Pattern Recognition and Image Analysis, Vol. 12, No. 3, 2002, pp. 299-307), a one-dimensional structure for a DFT processor. As shown in FIG. 3, the structure separates the computation of the odd part and the even part, and requires a larger hardware circuitry area.

There are numerous structures for conventional FFT processors. The objectives of an FFT processor are to use the least hardware area and cost, have the least time delay, and consume the least energy. The aforementioned structures have the disadvantages of frequent memory accesses, a large number of multiplications, and a large number of computational units.

SUMMARY OF THE INVENTION

The present invention has been made to overcome the aforementioned drawbacks of conventional FFT processors. The primary object of the present invention is to provide an FFT processor comprising a multiplexer, a first angle rotator, a second angle rotation and multiplexing unit, an adder, a twiddle factor storage, a multiplier, and a data storage. The processor reduces the hardware circuitry area and cost, as well as the technical complexity.

According to the present invention, the multiplexer selects an input set of N data items from a plurality of N-item sets and outputs a set of N data items, where N is the M-th power of 2 and M is an integer greater than or equal to 3. The first angle rotator receives N/2 data items from the N-item set, rotates the received N/2 data items by a first angle, and outputs the N/2 rotated data items sequentially. The second angle rotation and multiplexing unit receives a set of N data items and the N/2 rotated data items. Within a first preset duration, it selects the N data items; within a second preset duration, it selects N/2 data items from the N data items and combines them with the N/2 rotated data items. It then rotates the selected N data items by a second angle and outputs the rotated N data items sequentially.

The adder sequentially adds the N rotated data items and outputs their sum in the frequency domain. The twiddle factor storage stores all the twiddle factors of an N-point FFT. The multiplier sequentially multiplies the frequency-domain sum by the corresponding twiddle factor and outputs a mean data. The storage sequentially receives and stores the mean data, and outputs N data items to the multiplexer for the next stage of computation.

The FFT processor of the present invention further includes a first register array and a second register array. The first register array is located between the multiplexer and the second angle rotation and multiplexing unit. The second register array is located between the first angle rotator and the second angle rotation and multiplexing unit.

The present invention analyzes the input/output order of the computations in the FFT, extracts the part that requires complex computation to simplify the hardware, and adjusts the output order. This not only reduces the hardware circuitry area, cost, and technical complexity, but also reduces the amount of computation, the number of memory accesses, and the energy consumption.

The foregoing and other objects, features, aspects and advantages of the present invention will become better understood from a careful reading of the detailed description provided below with appropriate reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic view of the structure of a conventional FFT processor.

FIG. 2 shows a schematic view of the structure of another conventional FFT processor.

FIG. 3 shows a schematic view of the structure of yet another conventional FFT processor.

FIG. 4 shows a schematic view of the structure of an FFT processor according to the invention.

FIG. 5 shows a table of the relation between the input and the output of the DFT.

FIG. 6 shows the re-arranged computation order of the DFT.

FIG. 7 shows a table of the relation between the input and the output of the DFT after extracting W(1,8) and re-arranging the order.

FIG. 8A shows the addition of register arrays to FIG. 4 for temporary storage of input and output.

FIG. 8B shows the storing and the sequence arrangement of the data flow within the first preset duration according to the present invention.

FIG. 8C shows the storing and the sequence arrangement of the data flow within the second preset duration according to the present invention.

FIG. 9 shows a table of the twiddle factors in a 64-point FFT.

FIG. 10 shows a hardware structure of the first angle rotator of the present invention using N=8 as an embodiment.

FIG. 11 shows the functional comparison between the FFT processor of the present invention and conventional FFT processors.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 4 shows a schematic view of the structure of an FFT processor of the present invention. Referring to FIG. 4, FFT processor 400 comprises a multiplexer 41, a first angle rotator 42, a second angle rotation and multiplexing unit 43, an adder 44, a twiddle factor storage 45, a multiplier 46 and a storage 47. Multiplexer 41 selects an input set of N data items from a plurality of N-item sets and outputs a set of N data items, where N is the M-th power of 2 and M is an integer greater than or equal to 3. First angle rotator 42 receives N/2 data items from the N-item set, rotates the received N/2 data items by a first angle, and outputs the N/2 rotated data items sequentially. Second angle rotation and multiplexing unit 43 receives a set of N data items and the N/2 rotated data items. Within a first preset duration, it selects the N data items; within a second preset duration, it selects N/2 data items from the N data items and combines them with the N/2 rotated data items. It then rotates the selected N data items by a second angle and outputs the rotated N data items sequentially.

Adder 44 sequentially adds the N rotated data items and outputs their sum in the frequency domain. Twiddle factor storage 45 stores all the twiddle factors of an N-point FFT. Multiplier 46 sequentially multiplies the frequency-domain sum of the rotated data by the corresponding twiddle factor and outputs a mean data. Storage 47 sequentially receives and stores the mean data, and outputs N data items to multiplexer 41 for the next stage of computation.

Without loss of generality, the following description uses N=8 to explain the structure and the operation of the present invention.

FIG. 5 shows a table of the relation between the input and the output of the Discrete Fourier Transform (DFT), and FIG. 6 shows the re-arranged computation order of the DFT. As shown in FIG. 5, an input requires the complex multiplication by W(1,8) = $e^{-j2\pi/8}$ only when its sequence order is 1, 3, 5, or 7. Therefore, by re-arranging the sequence order as shown in FIG. 6, the original outputs with sequence order 1, 3, 5, and 7, which require rotation, become the outputs with sequence order 4, 5, 6, and 7. By postponing the output of these four data items, the W(1,8) complex multiplications for the four data items can be performed, and the results stored in registers, while the first four data items are being output. When the fifth output is computed, the stored data items can be used as input because the multiplication has already been completed.
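The pattern in FIG. 5 follows from the fact that $W_8^{kn}$ depends only on kn mod 8. Even exponents give 1, −j, −1, or j, which need no multiplier. An odd exponent, which carries a factor W(1,8), occurs only when k and n are both odd. The following Python sketch prints the table, marking the W(1,8) entries with `*`. The helper names are hypothetical and only illustrate the arithmetic.

```python
N = 8

def exponent(k, n):
    """Exponent e such that the (k, n) DFT term is x[n] * W_8^e."""
    return (k * n) % N

def needs_w18(k, n):
    """True when W_8^e is not one of 1, -j, -1, j, i.e. when e is odd."""
    return exponent(k, n) % 2 == 1

# Rows are outputs k, columns are inputs n.
for k in range(N):
    print(k, " ".join("*" if needs_w18(k, n) else "." for n in range(N)))
```

Only the odd rows have marked entries, and only in the odd columns. This is why the odd inputs can be rotated once and the odd outputs postponed.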

FIG. 7 shows the relation between the input and the output of the DFT after extracting W(1,8) and re-arranging the order. The table shown in FIG. 7 is obtained by separating the W(1,8) complex multiplication from a butterfly-type FFT processor having a base of 8. The remaining computation can be completed with a butterfly-type FFT processor having a base of 4. The present invention does not use the radix-2² algorithm to realize the butterfly-type FFT for two reasons. Its memory arrangement is not compatible, and the base-4 butterfly-type FFT can reduce the computation delay with a carry save adder (CSA) structure.

Therefore, the actual computation can be divided into two categories. The first category includes the four outputs X(1), X(3), X(5), and X(7). These must pass through the W(1,8) (2π/8) angle rotator and a butterfly-type FFT processor having a base of 4. The second category includes the four outputs X(0), X(2), X(4), and X(6). These only require a butterfly-type FFT processor having a base of 4, without an angle rotator. By re-arranging the time sequence, the time spent computing the FFT for the second category can be used to compute the W(1,8) rotation for the first category. In this way, the utilization of the computation units, such as the twiddle factor multiplier and the adder for the base-8 butterfly-type FFT, is improved.
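The extraction behind FIG. 7 can be written as $W_8^{kn} = W(1,8)^{b}\,(-j)^{m}$ with b ∈ {0, 1}, because $W_8^2 = W_4 = -j$. The Python sketch below computes (b, m) for every (k, n) and checks the identity numerically. The names `factor` and `MINUS_J_POW` are hypothetical. The sketch shows only the arithmetic, not the patented circuit.

```python
import cmath

MINUS_J_POW = [1, -1j, -1, 1j]   # (-j)^m: realizable with swaps and sign changes only

def w8(e):
    """W_8^e = exp(-j*2*pi*e/8)."""
    return cmath.exp(-2j * cmath.pi * e / 8)

def factor(k, n):
    """Return (b, m) with W_8^(k*n) == W(1,8)^b * (-j)^m and b in {0, 1}."""
    e = (k * n) % 8
    b = e % 2          # 1 only when k and n are both odd
    m = (e - b) // 2   # remaining exponent of W_4 = -j
    return b, m

for k in range(8):
    print(k, [factor(k, n) for n in range(8)])
```

Because b = 1 only for the inputs x(1), x(3), x(5), and x(7), each of those four items can be rotated once by W(1,8) and then reused for all four odd outputs. Every other factor is multiplier-free.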

FIG. 8A adds register arrays to the structure shown in FIG. 4 to provide storage for the input and the output. As shown in FIG. 8A, an FFT processor 800 uses storage 47, a first register array 88, and a second register array 89 to provide the bandwidth required by the input and the storage required by the output. First angle rotator 42 is realized with a shifter and an adder (discussed with reference to FIG. 10). By utilizing time-sharing, the hardware circuitry is reduced by two-thirds. Second angle rotation and multiplexing unit 43 is realized with crossed wiring and some inverters. The compensation terms required by the two's-complement computation are left to adder 44 to complete. This greatly reduces the power consumption caused by carry propagation.
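One plausible reading of "crossed wiring and inverters" with compensation deferred to adder 44 is sketched below in Python. Multiplying by −j swaps the real and imaginary parts and negates one of them. In two's complement, −v = ~v + 1, so the inverter can emit ~v alone, and the adder can add all the missing 1s in one place. The word width and all function names are hypothetical assumptions, not details from the source.

```python
WIDTH = 16                  # hypothetical datapath width
MASK = (1 << WIDTH) - 1

def to_signed(v):
    """Interpret the low WIDTH bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v

def rotate_minus_j(re, im):
    """(re + j*im) * (-j) = im - j*re: swap the wires and invert the new imaginary part.
    Returns the two bit patterns plus the +1 that the inverter left out (owed to imag)."""
    return im & MASK, (~re) & MASK, 1

def accumulate(patterns, owed_ones):
    """Adder stage: sum the bit patterns, then add all deferred +1 terms at once."""
    return to_signed(sum(patterns) + owed_ones)

# Sum (300 - 45j) * (-j) with (-1200 + 7j); the exact result is -1245 - 293j.
re_a, im_a, owed = rotate_minus_j(300, -45)
real_sum = accumulate([re_a, -1200 & MASK], 0)
imag_sum = accumulate([im_a, 7 & MASK], owed)
print(real_sum, imag_sum)   # -1245 -293
```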

FIG. 8B and FIG. 8C show the storing and the sequence arrangement of the data flow within the first and the second preset durations according to the present invention, respectively.

FIG. 8B shows the storing and the sequence arrangement of the data flow within the first preset duration according to the present invention. Multiplexer 41 selects one of two 8-item sets as input and outputs an 8-item set. As shown in FIG. 8B, within the first preset duration, second angle rotation and multiplexing unit 43 receives an 8-item set from first register array 88 (shown as solid lines). Meanwhile, second angle rotation and multiplexing unit 43 and adder 44 compute the frequency-domain data X(0), X(2), X(4), and X(6). At the same time, first angle rotator 42 receives (shown as dashed lines) the other four input items x(1), x(3), x(5), and x(7). It rotates them by 2π/8 and stores the rotated items in second register array 89.

FIG. 8C shows the storing and the sequence arrangement of the data flow within the second preset duration according to the present invention. As shown in FIG. 8C, within the second preset duration, second angle rotation and multiplexing unit 43 selects the remaining four input items x(0), x(2), x(4), and x(6). It combines them with the four rotated items (shown as solid lines) to form the input. Rotations by multiples of 2π/4 are performed on the 8 input items, and the results are output in sequential order. Adder 44 sequentially adds the 8 rotated items and outputs the frequency-domain data X(1), X(3), X(5), and X(7).
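The two preset durations can be modeled behaviorally in Python. During the first duration, the even outputs are produced from register array 88 using only multiples of the 2π/4 rotation. In the same cycles, the first angle rotator handles one odd input per cycle and fills register array 89. During the second duration, the odd outputs combine even inputs from array 88 with the pre-rotated items from array 89. This is a floating-point sketch under these assumptions; the function and variable names are hypothetical and do not model cycle-exact hardware.

```python
import cmath

W18 = cmath.exp(-2j * cmath.pi / 8)   # the 2*pi/8 rotation W(1,8)
MINUS_J_POW = [1, -1j, -1, 1j]        # (-j)^m: multiples of the 2*pi/4 rotation

def fft8_two_durations(x):
    """Model of the FIG. 8B/8C data flow; returns [(k, X[k])] in output order."""
    reg88 = list(x)   # first register array 88: all eight inputs
    reg89 = []        # second register array 89: rotated odd inputs
    out = []
    # First preset duration: even outputs need only trivial rotations, while the
    # first angle rotator processes one odd input per output cycle.
    for cycle, k in enumerate((0, 2, 4, 6)):
        out.append((k, sum(reg88[n] * MINUS_J_POW[(k // 2 * n) % 4] for n in range(8))))
        reg89.append(W18 * reg88[2 * cycle + 1])   # x(1), x(3), x(5), x(7) in turn
    # Second preset duration: even inputs from reg88 plus rotated items from reg89.
    for k in (1, 3, 5, 7):
        acc = 0j
        for n in range(8):
            e = (k * n) % 8
            if n % 2 == 0:
                acc += reg88[n] * MINUS_J_POW[e // 2]
            else:   # W_8^e = W(1,8) * (-j)^((e - 1) / 2)
                acc += reg89[n // 2] * MINUS_J_POW[(e - 1) // 2]
        out.append((k, acc))
    return out

x = [1, 2, 3, 4, 0, -1, 2, 5]
print([k for k, _ in fft8_two_durations(x)])   # [0, 2, 4, 6, 1, 3, 5, 7]
```

The rotator only ever sees four items. Its work fits inside the four cycles that the even outputs occupy, which is the time-sharing described above.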

Take N=8 as an example. The present invention uses an FFT having a base of 8. As pointed out by Yeo (“Low Power Implementation of FFT/IFFT Processor for IEEE 802.11a Wireless LAN Transceiver”), the design using base 8 consumes the least power. In application, the IEEE 802.11a specification demands a 64-point FFT. Therefore, the present invention uses two passes of computation, that is, two stages, to implement the 64-point FFT. This reduces both the computation and the memory accesses to twiddle factor storage 45 and storage 47, and therefore reduces the power consumption.
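The two-stage 64-point computation can be checked numerically. The Python sketch below uses a standard 8×8 Cooley-Tukey index mapping (n = 8·n1 + n2, k = k1 + 8·k2) with a twiddle multiplication between the stages. The mapping and the layout of the intermediate data are assumptions. They were chosen so that the first stage-1 batch lands at positions 0, 8, ..., 56 and the twiddle exponent at position i is (i div 8)·(i mod 8). A direct DFT stands in for the rotator-based 8-point unit.

```python
import cmath
import random

def dft(x):
    """Direct DFT, used here both as the 8-point engine and as the reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) for k in range(N)]

def w64(e):
    """Twiddle factor W(e,64) = exp(-j*2*pi*e/64)."""
    return cmath.exp(-2j * cmath.pi * e / 64)

def fft64_two_stage(x):
    """64-point DFT as two stages of 8-point DFTs with a twiddle multiply in between."""
    mid = [0j] * 64                    # stands in for storage 47
    for n2 in range(8):                # stage 1: stride-8 inputs x[n2], x[n2 + 8], ...
        col = dft([x[8 * n1 + n2] for n1 in range(8)])
        for k1 in range(8):
            mid[8 * k1 + n2] = col[k1] * w64(n2 * k1)   # multiplier 46
    X = [0j] * 64
    for k1 in range(8):                # stage 2: 8-point DFT over each row of mid
        row = dft(mid[8 * k1: 8 * k1 + 8])
        for k2 in range(8):
            X[k1 + 8 * k2] = row[k2]
    return X

random.seed(0)
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(64)]
err = max(abs(a - b) for a, b in zip(fft64_two_stage(x), dft(x)))
print(f"max error vs direct 64-point DFT: {err:.2e}")
```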

The output of adder 44 marks the end of the first stage. To implement the 64-point FFT, the corresponding twiddle factors must be read from twiddle factor storage 45. FIG. 9 shows a table of the twiddle factors for the 64-point FFT. The frequency-domain data X(0) through X(8) are multiplied by the twiddle factor W(0,64) = $e^{-j2\pi\cdot 0/64} = 1$. X(9) is multiplied by the twiddle factor W(1,64) = $e^{-j2\pi\cdot 1/64}$, X(63) is multiplied by the twiddle factor W(49,64) = $e^{-j2\pi\cdot 49/64}$, and so on.
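The quoted FIG. 9 entries are consistent with a simple rule. If intermediate item i is taken to hold row i div 8 and column i mod 8, its twiddle exponent is their product. The Python sketch below builds the table under that assumed layout. It reproduces the entries stated in the text: exponent 0 for X(0) through X(8), 1 for X(9), and 49 for X(63). The name `twiddle_exponent` is hypothetical.

```python
import cmath

def twiddle_exponent(i, radix=8):
    """Exponent e of W(e,64) for intermediate item i (assumed layout: row i // 8, column i % 8)."""
    return (i // radix) * (i % radix)

exponents = [twiddle_exponent(i) for i in range(64)]
twiddles = [cmath.exp(-2j * cmath.pi * e / 64) for e in exponents]   # contents of storage 45

for row in range(8):
    print(exponents[8 * row: 8 * row + 8])
```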

In the 64-point FFT, the first batch of frequency-domain data X(0), X(8), X(16), X(24), X(32), X(40), X(48), and X(56) output by adder 44 in the first stage is multiplied by the corresponding twiddle factors in multiplier 46 and stored in storage 47. After all 64 points finish the first-stage computation, the second-stage computation starts.

In general, one of the major difficulties in implementing base-8 FFT circuitry is the rotation by the 2π/8 angle. This rotation requires two real-number multipliers and a real-number adder. Expressed with the twiddle factor of the FFT, it is the complex-number multiplication by

$W(1,8) = e^{-j\frac{2\pi}{8}}$

However, in the structure of the present invention, the multiplication by W(1,8) is simplified to:

$a \times e^{-j\frac{2\pi}{8}} = a \times \frac{\sqrt{2}}{2} \times (1 - j) \cong a \times \left(2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-9}\right) \times (1 - j)$

which can be further re-arranged as:

$a \times e^{-j\frac{2\pi}{8}} \cong \left(a \times \left(2^{-1} + 2^{-3} + 2^{-4}\right)\right) \times \left(1 + 2^{-5}\right) \times (1 - j)$

In this way, the complex-number multiplication is reduced to computation that a shift adder can perform, which reduces both the hardware circuitry area and the computation delay.
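The factored constant can be exercised on integers in Python. Because (a + jb)(1 − j) = (a + b) + j(b − a), one rotation needs one add, one subtract, and two copies of the shift-add constant multiplier. The sketch assumes integer (fixed-point) operands and uses Python's arithmetic right shift, which truncates toward negative infinity. Rounding behavior is a design choice the source does not specify, and the function names are hypothetical.

```python
import math

def mul_cos45(a):
    """a * (2^-1 + 2^-3 + 2^-4) * (1 + 2^-5) using arithmetic shifts and adds only."""
    t = (a >> 1) + (a >> 3) + (a >> 4)   # a * (2^-1 + 2^-3 + 2^-4)
    return t + (t >> 5)                  # t * (1 + 2^-5)

def rotate_w18(re, im):
    """Approximate (re + j*im) * e^(-j*2*pi/8) = (sqrt(2)/2) * ((re + im) + j*(im - re))."""
    return mul_cos45(re + im), mul_cos45(im - re)

coef = (2**-1 + 2**-3 + 2**-4) * (1 + 2**-5)
print(coef, math.sqrt(2) / 2)   # 0.708984375 0.7071067811865476
print(rotate_w18(4096, 0))      # (2904, -2904)
```

The factored form needs four shifts and three additions instead of the six shifts and five additions of the expanded sum. The constant is roughly 0.27 % larger than √2/2.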

FIG. 10 shows the hardware structure of the first angle rotator of the present invention, with N=8 as an example. As shown in FIG. 10, the first angle rotator implements the last stage of summation with a carry save adder (CSA) structure to avoid the delay and power consumption caused by carry propagation. The delay of a carry ripple adder (CRA) is proportional to the number of input bits, whereas the delay of each CSA stage is fixed. The structure therefore completes the computation faster.
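Carry-save accumulation can be modeled behaviorally in Python, assuming non-negative integer operands (sign handling is omitted). Each 3:2 step turns three operands into a sum word and a carry word using only bitwise logic, so its delay does not depend on word length. A single carry-propagate addition finishes the job. The function names are hypothetical, and the six shifted terms are the ones from the W(1,8) approximation above.

```python
def csa(a, b, c):
    """3:2 carry-save step: a + b + c == s + carry, with no carry propagation."""
    s = a ^ b ^ c                                  # bitwise sum
    carry = ((a & b) | (a & c) | (b & c)) << 1     # bitwise majority, shifted left
    return s, carry

def csa_sum(terms):
    """Reduce non-negative terms to two with CSA steps, then do one carry-propagate add."""
    terms = list(terms)
    while len(terms) > 2:
        terms.extend(csa(terms.pop(), terms.pop(), terms.pop()))
    return sum(terms)   # the only addition whose delay grows with word length

a = 4096
partials = [a >> s for s in (1, 3, 4, 6, 8, 9)]   # a * 2^-1, a * 2^-3, ..., a * 2^-9
print(csa_sum(partials))   # 2904
```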

FIG. 11 compares the hardware structure and functionality of the FFT processor of the present invention with those of conventional techniques. As shown in the table, the FFT processor of the present invention has the advantages of a simple control mechanism, efficient computation, fewer computation units, and less memory.

In summary, the FFT processor of the present invention uses temporary registers to store intermediate computation results and re-arranges the sequence order of the operands so that the required operands can be pre-loaded into the registers. Therefore, the higher bandwidth of the registers can be exploited.

Although the present invention has been described with reference to the preferred embodiments, it will be understood that the invention is not limited to the details described therein. Various substitutions and modifications have been suggested in the foregoing description, and others will occur to those of ordinary skill in the art. Therefore, all such substitutions and modifications are intended to be embraced within the scope of the invention as defined in the appended claims.

1. A Fast Fourier Transform (FFT) processor, comprising: a multiplexer for selecting an input set of N data items from a plurality of N-item sets and outputting a set of N data items, N being an M-th power of 2, M being an integer greater than or equal to 3; a first angle rotator for receiving N/2 data items from said N-item set, rotating said received N/2 data items by a first angle, and outputting said rotated N/2 data items sequentially; a second angle rotation and multiplexing unit for receiving a set of N data items and said rotated N/2 data items, then either selecting said N data items within a first preset duration or selecting N/2 data items from said N data items for combining with said rotated N/2 data items within a second preset duration, rotating said resulting N data items by a second angle, and outputting said rotated N data items sequentially; an adder for adding said N rotated data items sequentially, and outputting a sum in frequency domain of said N rotated data items; a twiddle factor storage for storing all the twiddle factors of an N-point FFT; a multiplier for multiplying said sum in frequency domain with said corresponding twiddle factor sequentially, and outputting a mean data; and a storage for receiving and storing said mean data sequentially, and outputting said N data items to said multiplexer.
2. The FFT processor as claimed in claim 1, further comprising: an array of N registers for receiving said N data items from said multiplexer, outputting said N data items within said first preset duration, and outputting said N/2 data items not received by said first angle rotator within said second preset duration; and an array of N/2 registers for receiving said rotated N/2 data items from said first angle rotator, and outputting said N/2 data items within said second preset duration.
3. The FFT processor as claimed in claim 1, wherein said first angle rotator rotates by 2π/N.
4. The FFT processor as claimed in claim 1, wherein said first angle rotator rotates by a fixed angle.
5. The FFT processor as claimed in claim 1, wherein said second angle rotator rotates by 2π/(N/2).
6. The FFT processor as claimed in claim 5, wherein the rotation angle of said second angle rotator is a multiple of said second angle.
7. The FFT processor as claimed in claim 1, wherein said N equals 8.
8. The FFT processor as claimed in claim 1, wherein said first preset duration is earlier than said second preset duration.
9. The FFT processor as claimed in claim 1, wherein the N/2 items of said N items received by said first angle rotator have the sequence order 1, 3, 5, 7, . . . , (N-1).